Search | BVS Regional Portal

1.

Minimum information and guidelines for reporting a multiplexed assay of variant effect.

Claussnitzer, Melina; Parikh, Victoria N; Wagner, Alex H; Arbesfeld, Jeremy A; Bult, Carol J; Firth, Helen V; Muffley, Lara A; Nguyen Ba, Alex N; Riehle, Kevin; Roth, Frederick P; Tabet, Daniel; Bolognesi, Benedetta; Glazer, Andrew M; Rubin, Alan F.

Genome Biol ; 25(1): 100, 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38641812

ABSTRACT

Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.

Subject(s)

Metadata , Research Design , Reproducibility of Results

2.

Insect detect: An open-source DIY camera trap for automated insect monitoring.

Sittinger, Maximilian; Uhler, Johannes; Pink, Maximilian; Herz, Annette.

PLoS One ; 19(4): e0295474, 2024.

Article in English | MEDLINE | ID: mdl-38568922

ABSTRACT

Insect monitoring is essential to design effective conservation strategies, which are indispensable to mitigate worldwide declines and biodiversity loss. For this purpose, traditional monitoring methods are widely established and can provide data with a high taxonomic resolution. However, processing of captured insect samples is often time-consuming and expensive, which limits the number of potential replicates. Automated monitoring methods can facilitate data collection at a higher spatiotemporal resolution with a comparatively lower effort and cost. Here, we present the Insect Detect DIY (do-it-yourself) camera trap for non-invasive automated monitoring of flower-visiting insects, which is based on low-cost off-the-shelf hardware components combined with open-source software. Custom trained deep learning models detect and track insects landing on an artificial flower platform in real time on-device and subsequently classify the cropped detections on a local computer. Field deployment of the solar-powered camera trap confirmed its resistance to high temperatures and humidity, which enables autonomous deployment during a whole season. On-device detection and tracking can estimate insect activity/abundance after metadata post-processing. Our insect classification model achieved a high top-1 accuracy on the test dataset and generalized well on a real-world dataset with captured insect images. The camera trap design and open-source software are highly customizable and can be adapted to different use cases. With custom trained detection and classification models, as well as accessible software programming, many possible applications surpassing our proposed deployment method can be realized.
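
The abstract notes that insect activity/abundance is estimated by post-processing the on-device tracking metadata. A minimal sketch of that idea, assuming a hypothetical metadata table with one row per detection holding an ISO timestamp and a track ID (this schema is invented here and is not the Insect Detect output format): count unique tracks per hour and discard tracks seen in too few frames.

```python
from collections import defaultdict
from datetime import datetime

def activity_per_hour(rows, min_detections=3):
    """Count unique insect tracks per hour.

    rows: iterable of (iso_timestamp, track_id) tuples (hypothetical schema).
    Tracks with fewer than `min_detections` rows are treated as noise.
    """
    detections = defaultdict(list)  # track_id -> list of detection times
    for ts, track_id in rows:
        detections[track_id].append(datetime.fromisoformat(ts))

    per_hour = defaultdict(int)
    for times in detections.values():
        if len(times) < min_detections:
            continue  # discard very short tracks
        # Attribute each track to the hour of its first detection.
        first = min(times)
        per_hour[first.strftime("%Y-%m-%d %H:00")] += 1
    return dict(per_hour)
```

The minimum-detections threshold and the "first detection" rule are illustrative choices; a real pipeline would tune them against annotated recordings.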

Subject(s)

Insects , Software , Animals , Biodiversity , Data Collection , Metadata

3.

Towards building a trustworthy pipeline integrating Neuroscience Gateway and Open Science Chain.

Sivagnanam, S; Yeu, S; Lin, K; Sakai, S; Garzon, F; Yoshimoto, K; Prantzalos, K; Upadhyaya, D P; Majumdar, A; Sahoo, S S; Lytton, W W.

Database (Oxford) ; 2024, 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38581360

ABSTRACT

When a scientific dataset evolves or is reused in workflows creating derived datasets, the integrity of the dataset with its metadata information, including provenance, needs to be securely preserved while providing assurances that they are not accidentally or maliciously altered during the process. Providing a secure method to efficiently share and verify the data as well as metadata is essential for the reuse of the scientific data. The National Science Foundation (NSF) funded Open Science Chain (OSC) utilizes consortium blockchain to provide a cyberinfrastructure solution to maintain integrity of the provenance metadata for published datasets and provides a way to perform independent verification of the dataset while promoting reuse and reproducibility. The NSF- and National Institutes of Health (NIH)-funded Neuroscience Gateway (NSG) provides a freely available web portal that allows neuroscience researchers to execute computational data analysis pipelines on high performance computing resources. Combined, the OSC and NSG platforms form an efficient, integrated framework to automatically and securely preserve and verify the integrity of the artifacts used in research workflows while using the NSG platform. This paper presents the results of the first study that integrates OSC-NSG frameworks to track the provenance of neurophysiological signal data analysis to study brain network dynamics using the Neuro-Integrative Connectivity tool, which is deployed in the NSG platform. Database URL: https://www.opensciencechain.org.
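
Independent verification of a published dataset typically rests on comparing a cryptographic fingerprint recorded at publication time with one recomputed later. The abstract does not describe the OSC interface, so the following is only a generic sketch of content hashing with SHA-256, assuming the recorded digest has already been retrieved from wherever it is stored; it is not the OSC API.

```python
import hashlib
import hmac
import io

def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hex SHA-256 digest of a binary stream, read in chunks to bound memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

def sha256_of_file(path):
    """Hex SHA-256 digest of a file on disk."""
    with open(path, "rb") as f:
        return sha256_of_stream(f)

def matches_record(current_digest, recorded_digest):
    """True if the recomputed digest equals the recorded one (constant-time compare)."""
    return hmac.compare_digest(current_digest, recorded_digest)

if __name__ == "__main__":
    data = b"example dataset bytes"
    recorded = sha256_of_stream(io.BytesIO(data))  # fingerprint at publication time
    print(matches_record(sha256_of_stream(io.BytesIO(data)), recorded))         # True
    print(matches_record(sha256_of_stream(io.BytesIO(data + b"!")), recorded))  # False
```

Any single-byte change to the data yields a different digest, which is what makes a stored fingerprint useful for detecting accidental or malicious alteration.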

Subject(s)

Neurosciences , Publications , Reproducibility of Results , Databases, Factual , Metadata

4.

Seek and you may (not) find: A multi-institutional analysis of where research data are shared.

Johnston, Lisa R; Hofelich Mohr, Alicia; Herndon, Joel; Taylor, Shawna; Carlson, Jake R; Ge, Lizhao; Moore, Jennifer; Petters, Jonathan; Kozlowski, Wendy; Hudson Vitale, Cynthia.

PLoS One ; 19(4): e0302426, 2024.

Article in English | MEDLINE | ID: mdl-38662676

ABSTRACT

Research data sharing has become an expected component of scientific research and scholarly publishing practice over the last few decades, due in part to requirements for federally funded research. As part of a larger effort to better understand the workflows and costs of public access to research data, this project conducted a high-level analysis of where academic research data is most frequently shared. To do this, we leveraged the DataCite and Crossref application programming interfaces (APIs) in search of Publisher field elements demonstrating which data repositories were utilized by researchers from six academic research institutions between 2012-2022. In addition, we also ran a preliminary analysis of the quality of the metadata associated with these published datasets, comparing the extent to which information was missing from metadata fields deemed important for public access to research data. Results show that the top 10 publishers accounted for 89.0% to 99.8% of the datasets connected with the institutions in our study. Known data repositories, including institutional data repositories hosted by those institutions, were initially lacking from our sample due to varying metadata standards and practices. We conclude that the metadata quality landscape for published research datasets is uneven; key information, such as author affiliation, is often incomplete or missing from source data repositories and aggregators. To enhance the findability, interoperability, accessibility, and reusability (FAIRness) of research data, we provide a set of concrete recommendations that repositories and data authors can take to improve scholarly metadata associated with shared datasets.
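
The two measurements in this study (the share of datasets held by the top publishers, and how often important metadata fields are missing) can be sketched as follows. This assumes records have already been retrieved (for example from the DataCite or Crossref APIs) and flattened into dictionaries; the field names `creator_affiliation` and `license` are hypothetical and do not reproduce either API's schema.

```python
from collections import Counter

def is_missing(value):
    """Treat None, blank strings and empty containers as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False

def missing_rates(records, fields):
    """Fraction of records in which each field is missing."""
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_missing(r.get(field)) for r in records) / len(records)
        for field in fields
    }

def top_k_share(records, k, field="publisher"):
    """Share of records held by the k most frequent values of `field`."""
    counts = Counter(r.get(field) for r in records if not is_missing(r.get(field)))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total

records = [  # invented example records
    {"publisher": "Zenodo", "creator_affiliation": "", "license": "CC-BY-4.0"},
    {"publisher": "Dryad", "creator_affiliation": "Univ A", "license": None},
    {"publisher": "Zenodo", "license": "CC0-1.0"},
    {"publisher": "Figshare", "creator_affiliation": ["Univ B"], "license": "CC-BY-4.0"},
]
```

On these toy records, `missing_rates(records, ["publisher", "creator_affiliation", "license"])` gives 0.0, 0.5 and 0.25, and `top_k_share(records, 1)` gives 0.5.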

Subject(s)

Information Dissemination , Metadata , Information Dissemination/methods , Humans , Biomedical Research

5.

A minimal metadata set (MNMS) to repurpose nonclinical in vivo data for biomedical research.

Moresis, Anastasios; Restivo, Leonardo; Bromilow, Sophie; Flik, Gunnar; Rosati, Giorgio; Scorrano, Fabrizio; Tsoory, Michael; O'Connor, Eoin C; Gaburro, Stefano; Bannach-Brown, Alexandra.

Lab Anim (NY) ; 53(3): 67-79, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38438748

ABSTRACT

Although biomedical research is experiencing a data explosion, the accumulation of vast quantities of data alone does not guarantee a primary objective for science: building upon existing knowledge. Data collected that lack appropriate metadata cannot be fully interrogated or integrated into new research projects, leading to wasted resources and missed opportunities for data repurposing. This issue is particularly acute for research using animals, where concerns regarding data reproducibility and ensuring animal welfare are paramount. Here, to address this problem, we propose a minimal metadata set (MNMS) designed to enable the repurposing of in vivo data. MNMS aligns with an existing validated guideline for reporting in vivo data (ARRIVE 2.0) and contributes to making in vivo data FAIR-compliant. Scenarios where MNMS should be implemented in diverse research environments are presented, highlighting opportunities and challenges for data repurposing at different scales. We conclude with a 'call for action' to key stakeholders in biomedical research to adopt and apply MNMS to accelerate both the advancement of knowledge and the betterment of animal welfare.

Subject(s)

Biomedical Research , Metadata , Animals , Reproducibility of Results , Animal Welfare

6.

pyM2aia: Python interface for mass spectrometry imaging with focus on deep learning.

Cordes, Jonas; Enzlein, Thomas; Hopf, Carsten; Wolf, Ivo.

Bioinformatics ; 40(3), 2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38445753

ABSTRACT

SUMMARY: Python is the most commonly used language for deep learning (DL). Existing Python packages for mass spectrometry imaging (MSI) data are not optimized for DL tasks. We, therefore, introduce pyM2aia, a Python package for MSI data analysis with a focus on memory-efficient handling, processing and convenient data-access for DL applications. pyM2aia provides interfaces to its parent application M2aia, which offers interactive capabilities for exploring and annotating MSI data in imzML format. pyM2aia utilizes the image input and output routines, data formats, and processing functions of M2aia, ensures data interchangeability, and enables the writing of readable and easy-to-maintain DL pipelines by providing batch generators for typical MSI data access strategies. We showcase the package in several examples, including imzML metadata parsing, signal processing, ion-image generation, and, in particular, DL model training and inference for spectrum-wise approaches, ion-image-based approaches, and approaches that use spectral and spatial information simultaneously. AVAILABILITY AND IMPLEMENTATION: Python package, code and examples are available at (https://m2aia.github.io/m2aia).
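
pyM2aia's own batch-generator API is not shown in the abstract, so the following is only a framework-agnostic sketch of a spectrum-wise batch generator. It assumes spectra are already held in an in-memory list indexed by pixel (a simplification; imzML data would normally be read lazily) and that batches are reshuffled each epoch.

```python
import random

def spectrum_batches(spectra, batch_size, shuffle=True, seed=None):
    """Yield lists of (pixel_index, spectrum) pairs with at most batch_size items.

    `spectra` is a hypothetical in-memory sequence of per-pixel spectra.
    """
    indices = list(range(len(spectra)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # new order per call (epoch)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield [(i, spectra[i]) for i in chunk]
```

Keeping the pixel index alongside each spectrum lets predictions be mapped back to image coordinates, which spatial or ion-image-based approaches would need.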

Subject(s)

Deep Learning , Software , Mass Spectrometry/methods , Language , Metadata

7.

Estimating household contact matrices structure from easily collectable metadata.

Dall'Amico, Lorenzo; Kleynhans, Jackie; Gauvin, Laetitia; Tizzoni, Michele; Ozella, Laura; Makhasi, Mvuyo; Wolter, Nicole; Language, Brigitte; Wagner, Ryan G; Cohen, Cheryl; Tempia, Stefano; Cattuto, Ciro.

PLoS One ; 19(3): e0296810, 2024.

Article in English | MEDLINE | ID: mdl-38483886

ABSTRACT

Contact matrices are a commonly adopted data representation, used to develop compartmental models for epidemic spreading, accounting for the contact heterogeneities across age groups. Their estimation, however, is generally time- and effort-consuming, and model-driven strategies to quantify the contacts are often needed. In this article we focus on household contact matrices, describing the contacts among the members of a family, and develop a parametric model to describe them. This model combines demographic and easily quantifiable survey-based data and is tested on high resolution proximity data collected in two sites in South Africa. Given its simplicity and interpretability, we expect our method to be easily applied to other contexts as well, and we identify relevant questions that need to be addressed during the data collection procedure.
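
The parametric model itself is not specified in the abstract. As background, the sketch below shows one common way an empirical age-group contact matrix is assembled from proximity data: entry C[i][j] is the mean number of contacts an individual in group i has with members of group j. The input format (a member-to-group mapping and a list of undirected contact pairs) is a hypothetical simplification, not the authors' data structure.

```python
def contact_matrix(member_group, contacts, n_groups):
    """Mean contacts per individual of group i with group j.

    member_group: dict person_id -> age-group index (0..n_groups-1)
    contacts: list of (person_a, person_b) undirected contact events
    """
    sizes = [0] * n_groups
    for g in member_group.values():
        sizes[g] += 1

    counts = [[0] * n_groups for _ in range(n_groups)]
    for a, b in contacts:
        ga, gb = member_group[a], member_group[b]
        # An undirected contact is counted once from each participant's side.
        counts[ga][gb] += 1
        counts[gb][ga] += 1

    return [
        [counts[i][j] / sizes[i] if sizes[i] else 0.0 for j in range(n_groups)]
        for i in range(n_groups)
    ]
```

Because the contact counts are symmetric, the result satisfies the reciprocity condition C[i][j]·N_i = C[j][i]·N_j, where N_i is the size of group i; any fitted model of household contacts should preserve this property.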

Subject(s)

Epidemics , Metadata , Surveys and Questionnaires , Models, Epidemiological , South Africa , Contact Tracing/methods

8.

Quantifying the global film festival circuit: Networks, diversity, and public value creation.

Zemaityte, Vejune; Karjus, Andres; Rohn, Ulrike; Schich, Maximilian; Ibrus, Indrek.

PLoS One ; 19(3): e0297404, 2024.

Article in English | MEDLINE | ID: mdl-38446758

ABSTRACT

Film festivals are a key component in the global film industry in terms of trendsetting, publicity, trade, and collaboration. We present an unprecedented analysis of the international film festival circuit, which has so far remained relatively understudied quantitatively, partly due to the limited availability of suitable data sets. We use large-scale data from the Cinando platform of the Cannes Film Market, widely used by industry professionals. We explicitly model festival events as a global network connected by shared films and quantify festivals as aggregates of the metadata of their showcased films. Importantly, we argue against using simple count distributions for discrete labels such as language or production country, as such categories are typically not equidistant. Rather, we propose embedding them in continuous latent vector spaces. We demonstrate how these "festival embeddings" provide insight into changes in programmed content over time, predict festival connections, and can be used to measure diversity in film festival programming across various cultural, social, and geographical variables, which all constitute an aspect of public value creation by film festivals. Our results provide a novel mapping of the film festival circuit between 2009 and 2021 (616 festivals, 31,989 unique films), highlighting festival types that occupy specific niches, diverse series, and those that evolve over time. We also discuss how these quantitative findings fit into media studies and research on public value creation by cultural industries. With festivals occupying a central position in the film industry, investigations into the data they generate hold opportunities for researchers to better understand industry dynamics and cultural impact, and for organizers, policymakers, and industry actors to make more informed, data-driven decisions. We hope our proposed methodological approach to festival data paves the way for more comprehensive film festival studies and large-scale quantitative cultural event analytics in general.
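
The argument against count distributions can be made concrete. The sketch below assumes each label has already been mapped to a vector (the two-dimensional "embeddings" here are invented toy values, not learned ones) and scores a festival's diversity as the mean pairwise cosine distance between its films' label vectors. This is one possible embedding-based measure, not necessarily the one used in the paper.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity; vectors are assumed to be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def embedding_diversity(labels, embedding):
    """Mean pairwise cosine distance between the embeddings of films' labels."""
    vectors = [embedding[label] for label in labels]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)

# Toy vectors, invented for illustration only.
toy_embedding = {"Spanish": (1.0, 0.0), "Portuguese": (0.9, 0.1), "Japanese": (0.0, 1.0)}
```

Two festivals each showing films in two languages look identical under a count of distinct labels, but `embedding_diversity(["Spanish", "Portuguese"], toy_embedding)` is close to 0 while `embedding_diversity(["Spanish", "Japanese"], toy_embedding)` is 1.0, reflecting that the categories are not equidistant.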

Subject(s)

Holidays , Industry , Geography , Language , Metadata

9.

Prediction of maximum scour depth at clear water conditions: Multivariate and robust comparative analysis between empirical equations and machine learning approaches using extensive reference metadata.

Nandi, Buddhadev; Patel, Gaurav; Das, Subhasish.

J Environ Manage ; 354: 120349, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38401497

ABSTRACT

Flow obstructed by bridge piers can increase sediment transport, leading to local scour. This local scour poses a risk to the stability of bridge structures, which could lead to structural failures. There are two main approaches for evaluating the scour depth (ds) of bridge piers. The first is based on understanding hydraulic phenomena and developing relationships with properties affecting scour. The second uses data-driven soft computing models that lack physical interpretations but rely on algorithms to predict outcomes. Researchers choose methods based on their goals and resources. This study aims to create innovative ensemble frameworks comprising support vector machine for regression (SVMR), random forest regression (RFR), and reduced error pruning tree (REPTree) as base learners, alongside bagging regression tree (BRT) and stochastic gradient boosting (SGB) as meta learners. These ensembles were developed to analyse maximum scour depths (dsm) in clear water conditions, using experimental data from 35 published studies spanning the last 63 years. The performance of each machine learning (ML) approach was assessed using statistical performance indicators. The proposed model was also compared with the top six empirical equations with strong predictive ability. Results show that among these empirical equations, the equation from Nandi and Das (2023) performs best. Considering performance on the training, testing, and entire datasets, SGB (REPTree), BRT (SVMR-PUK), and SGB (REPTree) exhibited the highest performance, securing the top rank among all ML models and empirical equations. Sensitivity analysis identified sediment gradation and flow intensity as the most influential variables for predicting dsm in the training and testing phases, respectively.
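
Bagging, one of the meta-learning strategies named above, fits a base learner on many bootstrap resamples of the training data and averages the predictions. The sketch below shows the mechanism only: it swaps in a one-variable least-squares line as a toy base learner in place of the study's SVMR, RFR, or REPTree learners, and it is not the authors' configuration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # degenerate sample: flat line through the mean
    return my - b * mx, b

def bagged_lines(xs, ys, n_estimators=50, seed=0):
    """Fit one line per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict(models, x):
    """Bagged prediction: average of the individual models' outputs."""
    return sum(a + b * x for a, b in models) / len(models)
```

Averaging over resamples mainly reduces variance, which is why bagging is usually paired with unstable, high-variance base learners such as decision trees rather than a linear fit like this toy one.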

Subject(s)

Metadata , Water , Algorithms , Machine Learning

10.

Utilization of electronic health record sex and gender demographic fields: a metadata and mixed methods analysis.

Foer, Dinah; Rubins, David M; Nguyen, Vi; McDowell, Alex; Quint, Meg; Kellaway, Mitchell; Reisner, Sari L; Zhou, Li; Bates, David W.

J Am Med Inform Assoc ; 31(4): 910-918, 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38308819

ABSTRACT

OBJECTIVES: Despite federally mandated collection of sex and gender demographics in the electronic health record (EHR), longitudinal assessments are lacking. We assessed sex and gender demographic field utilization using EHR metadata. MATERIALS AND METHODS: Patients ≥18 years of age in the Mass General Brigham health system with a first Legal Sex entry (registration requirement) between January 8, 2018 and January 1, 2022 were included in this retrospective study. Metadata for all sex and gender fields (Legal Sex, Sex Assigned at Birth [SAAB], Gender Identity) were quantified by completion rates, user types, and longitudinal change. A nested qualitative study of providers from specialties with high and low field use identified themes related to utilization. RESULTS: 1 576 120 patients met inclusion criteria: 100% had a Legal Sex, 20% a Gender Identity, and 19% a SAAB; 321 185 patients had field changes other than initial Legal Sex entry. About 2% of patients had a subsequent Legal Sex change, and 25% of those had ≥2 changes; 20% of patients had ≥1 update to Gender Identity and 19% to SAAB. Excluding the first Legal Sex entry, administrators made most changes (67%) across all fields, followed by patients (25%), providers (7.2%), and automated Health Level-7 (HL7) interface messages (0.7%). Provider utilization varied by subspecialty; themes related to systems barriers and personal perceptions were identified. DISCUSSION: Sex and gender demographic fields are primarily used by administrators and raise concern about data accuracy; provider use is heterogeneous and lacking. Provider awareness of field availability and variable workflows may impede use. CONCLUSION: EHR metadata highlights areas for improvement of sex and gender field utilization.

Subject(s)

Gender Identity , Transgender Persons , Infant, Newborn , Humans , Male , Female , Electronic Health Records , Metadata , Retrospective Studies , Demography

11.

ezBIDS: Guided standardization of neuroimaging data interoperable with major data archives and platforms.

Levitas, Daniel; Hayashi, Soichi; Vinci-Booher, Sophia; Heinsfeld, Anibal; Bhatia, Dheeraj; Lee, Nicholas; Galassi, Anthony; Niso, Guiomar; Pestilli, Franco.

Sci Data ; 11(1): 179, 2024 Feb 08.

Article in English | MEDLINE | ID: mdl-38332144

ABSTRACT

Data standardization promotes a common framework through which researchers can utilize others' data and is one of the leading methods neuroimaging researchers use to share and replicate findings. As of today, standardizing datasets requires technical expertise such as coding and knowledge of file formats. We present ezBIDS, a tool for converting neuroimaging data and associated metadata to the Brain Imaging Data Structure (BIDS) standard. ezBIDS contains four major features: (1) No installation or programming requirements. (2) Handling of both imaging and task events data and metadata. (3) Semi-automated inference and guidance for adherence to BIDS. (4) Multiple data management options: download BIDS data to local system, or transfer to OpenNeuro.org or to brainlife.io. In sum, ezBIDS requires neither coding proficiency nor knowledge of BIDS, and is the first BIDS tool to offer guided standardization, support for task events conversion, and interoperability with OpenNeuro.org and brainlife.io.
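
BIDS, the standard ezBIDS converts to, encodes key metadata in file paths as ordered `key-label` entities (for example `sub-01/ses-01/func/sub-01_ses-01_task-rest_run-01_bold.nii.gz`). The minimal sketch below composes such a path for a functional run. It is not ezBIDS code and covers only the `sub`, `ses`, `task` and `run` entities; the full entity list, ordering rules and sidecar requirements are defined in the BIDS specification.

```python
import re

# BIDS labels may contain only letters and digits.
_LABEL = re.compile(r"[A-Za-z0-9]+")

def bids_func_path(subject, task, session=None, run=None, suffix="bold", ext=".nii.gz"):
    """Build a relative BIDS-style path for a functional run (subset of entities only)."""
    for name, value in (("subject", subject), ("session", session), ("task", task)):
        if value is not None and not _LABEL.fullmatch(value):
            raise ValueError(f"{name} label must be alphanumeric: {value!r}")

    # Entities in the relative order the spec prescribes: sub, ses, task, run.
    entities = [f"sub-{subject}"]
    if session is not None:
        entities.append(f"ses-{session}")
    entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run:02d}")  # zero-padding is a style choice here
    filename = "_".join(entities + [suffix]) + ext

    folders = [f"sub-{subject}"]
    if session is not None:
        folders.append(f"ses-{session}")
    folders.append("func")
    return "/".join(folders + [filename])
```

For example, `bids_func_path("02", "nback")` returns `sub-02/func/sub-02_task-nback_bold.nii.gz`. Tools such as ezBIDS aim to infer these labels from the raw data so that users do not have to assemble them by hand.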

Subject(s)

Metadata , Neuroimaging , Data Display , Data Analysis

12.

The Importance, Challenges, and Possible Solutions for Sharing Proteomics Data While Safeguarding Individuals' Privacy.

Shome, Mahasish; MacKenzie, Tim M G; Subbareddy, Smitha R; Snyder, Michael P.

Mol Cell Proteomics ; 23(3): 100731, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38331191

ABSTRACT

Proteomics data sharing has profound benefits at the individual level as well as at the community level. While data sharing has increased over the years, mostly due to journal and funding agency requirements, researchers' reluctance to share data is evident, as many share only the bare minimum dataset required to publish an article. In many cases, proper metadata is missing, essentially making the dataset useless. This behavior can be explained by a lack of incentives, insufficient awareness, or a lack of clarity surrounding ethical issues. Through adequate training at research institutes, researchers can realize the benefits associated with data sharing and can accelerate the norm of data sharing for the field of proteomics, as has been the standard in genomics for decades. In this article, we have put together various repository options available for proteomics data. We have also added pros and cons of those repositories to help researchers select the repository most suitable for their data submission. It is also important to note that a few types of proteomics data have the potential to re-identify an individual in certain scenarios. In such cases, extra caution should be taken to remove any personal identifiers before sharing on public repositories. Data sets that would be useless without personal identifiers need to be shared in a controlled-access repository so that only authorized researchers can access the data and personal identifiers are kept safe.

Subject(s)

Privacy , Proteomics , Humans , Genomics , Metadata , Information Dissemination

13.

Enabling Data Discovery with the Astrobiology Resource Metadata Standard.

Wolfe, Shawn R; Lafuente, Barbara; Keller, Richard M; Detweiler, Angela M; Bristow, Thomas F; Parenteau, Mary N; Boydstun, Kevin; Dateo, Christopher E; Des Marais, David J; Jahnke, Linda L; Rojo, Sara; Stone, Nathan; Vorobets, Mark.

Astrobiology ; 24(2): 131-137, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38393827

ABSTRACT

As scientific investigations increasingly adopt Open Science practices, reuse of data becomes paramount. However, despite decades of progress in internet search tools, finding relevant astrobiology datasets for an envisioned investigation remains challenging due to the precise and atypical needs of the astrobiology researcher. In response, we have developed the Astrobiology Resource Metadata Standard (ARMS), a metadata standard designed to uniformly describe astrobiology "resources," that is, virtually any product of astrobiology research. Those resources include datasets, physical samples, software (modeling codes and scripts), publications, websites, images, videos, presentations, and so on. ARMS has been formulated to describe astrobiology resources generated by individual scientists or smaller scientific teams, rather than larger mission teams who may be required to use more complex archival metadata schemes. In the following, we discuss the participatory development process, give an overview of the metadata standard, describe its current use in practice, and close with a discussion of additional possible uses and extensions.

Subject(s)

Exobiology , Metadata , Software

14.

Protocol for metadata and image collection at diabetic foot ulcer clinics: enabling research in wound analytics and deep learning.

Basiri, Reza; Manji, Karim; LeLievre, Philip M; Toole, John; Kim, Faith; Khan, Shehroz S; Popovic, Milos R.

Biomed Eng Online ; 23(1): 12, 2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38287324

ABSTRACT

BACKGROUND: The escalating impact of diabetes and its complications, including diabetic foot ulcers (DFUs), presents global challenges in quality of life, economics, and resources, affecting around half a billion people. DFU healing is hindered by hyperglycemia-related issues and diverse diabetes-related physiological changes, necessitating ongoing personalized care. Artificial intelligence and clinical research strive to address these challenges by facilitating early detection and efficient treatments despite resource constraints. This study establishes a standardized framework for DFU data collection, introducing a dedicated case report form, a comprehensive dataset named Zivot with patient population clinical feature breakdowns and a baseline for DFU detection using this dataset and a UNet architecture. RESULTS: Following this protocol, we created the Zivot dataset consisting of 269 patients with active DFUs, and about 3700 RGB images and corresponding thermal and depth maps for the DFUs. The effectiveness of collecting a consistent and clean dataset was demonstrated using a bounding box prediction deep learning network that was constructed with EfficientNet as the feature extractor and UNet architecture. The network was trained on the Zivot dataset, and the evaluation metrics showed promising values of 0.79 and 0.86 for F1-score and mAP segmentation metrics. CONCLUSIONS: This work and the Zivot database offer a foundation for further exploration of holistic and multimodal approaches to DFU research.
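
The detection baseline is reported with an F1-score of 0.79 alongside mAP. As a reminder of what the F1 figure measures, the sketch below computes it from true-positive, false-positive and false-negative counts, together with the intersection-over-union (IoU) commonly used to decide whether a predicted box matches a ground-truth box. The matching threshold and counts are not given in the abstract, so any values used with these functions are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

A prediction would typically count as a true positive when its IoU with an unmatched ground-truth box exceeds a chosen threshold; unmatched predictions are false positives and unmatched ground-truth boxes are false negatives.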

Subject(s)

Deep Learning , Diabetes Mellitus , Diabetic Foot , Humans , Diabetic Foot/diagnosis , Artificial Intelligence , Metadata , Quality of Life

15.

TrackdAT, an acoustic telemetry metadata dataset to support aquatic animal tracking research.

Matley, Jordan K; Klinard, Natalie V; Martins, Ana Barbosa; Oakley-Cogan, Arun; Huveneers, Charlie; Vandergoot, Christopher S; Fisk, Aaron T.

Sci Data ; 11(1): 143, 2024 Jan 30.

Article in English | MEDLINE | ID: mdl-38291027

ABSTRACT

Data on the movement and space use of aquatic animals are crucial to understand complex interactions among biotic and abiotic components of ecosystems and facilitate effective conservation and management. Acoustic telemetry (AT) is a leading method for studying the movement ecology of aquatic animals worldwide, yet the ability to efficiently access study information from AT research is currently lacking, limiting advancements in its application. Here, we describe TrackdAT, an open-source metadata dataset where AT research parameters are catalogued to provide scientists, managers, and other stakeholders with the ability to efficiently identify and evaluate existing peer-reviewed research. Extracted metadata encompasses key information about biological and technical aspects of research, providing a comprehensive summary of existing AT research. TrackdAT currently hosts information from 2,412 journal articles published from 1969 to 2022 spanning 614 species and 380,289 tagged animals. TrackdAT has the potential to enable regional and global mobilization of knowledge, increased opportunities for collaboration, greater stakeholder engagement, and optimization of future ecological research.

Subject(s)

Ecosystem , Metadata , Telemetry , Animals , Acoustics , Movement , Telemetry/methods

16.

Comparative assessment of synthetic time series generation approaches in healthcare: leveraging patient metadata for accurate data synthesis.

Isasa, Imanol; Hernandez, Mikel; Epelde, Gorka; Londoño, Francisco; Beristain, Andoni; Larrea, Xabat; Alberdi, Ane; Bamidis, Panagiotis; Konstantinidis, Evdokimos.

BMC Med Inform Decis Mak ; 24(1): 27, 2024 Jan 30.

Article in English | MEDLINE | ID: mdl-38291386

ABSTRACT

BACKGROUND: Synthetic data is an emerging approach for addressing legal and regulatory concerns in biomedical research that deals with personal and clinical data, whether as a single tool or through its combination with other privacy enhancing technologies. Generating uncompromised synthetic data could significantly benefit external researchers performing secondary analyses by providing unlimited access to information while fulfilling pertinent regulations. However, the original data to be synthesized (e.g., data acquired in Living Labs) may consist of subjects' metadata (static) and a longitudinal component (set of time-dependent measurements), making it challenging to produce coherent synthetic counterparts. METHODS: Three synthetic time series generation approaches were defined and compared in this work: only generating the metadata and coupling it with the real time series from the original data (A1), generating both metadata and time series separately to join them afterwards (A2), and jointly generating both metadata and time series (A3). The comparative assessment of the three approaches was carried out using two different synthetic data generation models: the Wasserstein GAN with Gradient Penalty (WGAN-GP) and the DöppelGANger (DGAN). The experiments were performed with three different healthcare-related longitudinal datasets: Treadmill Maximal Effort Test (TMET) measurements from the University of Malaga (1), a hypotension subset derived from the MIMIC-III v1.4 database (2), and a lifelogging dataset named PMData (3). RESULTS: Three pivotal dimensions were assessed on the generated synthetic data: resemblance to the original data (1), utility (2), and privacy level (3). The optimal approach fluctuates based on the assessed dimension and metric. CONCLUSION: The initial characteristics of the datasets to be synthesized play a crucial role in determining the best approach. Coupling synthetic metadata with real time series (A1), as well as jointly generating synthetic time series and metadata (A3), are both competitive methods, while separately generating time series and metadata (A2) appears to perform more poorly overall.

Subject(s)

Metadata , Privacy , Humans , Time Factors , Databases, Factual

17.

A web-based dashboard for RELION metadata visualization.

González-Rodríguez, Nayim; Areán-Ulloa, Emma; Fernández-Leiro, Rafael.

Acta Crystallogr D Struct Biol ; 80(Pt 2): 93-100, 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38265874

ABSTRACT

Cryo-electron microscopy (cryo-EM) has witnessed radical progress in the past decade, driven by developments in hardware and software. While current software packages include processing pipelines that simplify the image-processing workflow, they do not prioritize the in-depth analysis of crucial metadata, limiting troubleshooting for challenging data sets. The widely used RELION software package lacks a graphical native representation of the underlying metadata. Here, two web-based tools are introduced: relion_live.py, which offers real-time feedback on data collection, aiding swift decision-making during data acquisition, and relion_analyse.py, a graphical interface to represent RELION projects by plotting essential metadata including interactive data filtration and analysis. A useful script for estimating ice thickness and data quality during movie pre-processing is also presented. These tools empower researchers to analyse data efficiently and allow informed decisions during data collection and processing.

Subject(s)

Image Processing, Computer-Assisted , Metadata , Cryoelectron Microscopy , Software , Internet

18.

The African Human Microbiome Portal: a public web portal of curated metagenomic metadata.

Kiran, Anmol; Hanachi, Mariem; Alsayed, Nihad; Fassatoui, Meriem; Oduaran, Ovokeraye H; Allali, Imane; Maslamoney, Suresh; Meintjes, Ayton; Zass, Lyndon; Rocha, Jorge Da; Kefi, Rym; Benkahla, Alia; Ghedira, Kais; Panji, Sumir; Mulder, Nicola; Fadlelmola, Faisal M; Souiai, Oussema.

Database (Oxford) ; 2024, 2024 Jan 10.

Article in English | MEDLINE | ID: mdl-38204360

ABSTRACT

There is growing evidence that comprehensive and harmonized metadata are fundamental for the effective reuse of public data. However, it is often challenging to extract accurate metadata from public repositories. Of particular concern are metagenomic data from African individuals, which often omit important information about the particular features of these populations. As part of a collaborative consortium, H3ABioNet, we created a web portal, the African Human Microbiome Portal (AHMP), dedicated exclusively to metadata from African human microbiome samples. Metadata were collected from various public repositories and then cleaned, curated and harmonized according to a pre-established guideline using ontology terms. These metadata sets can be accessed at https://microbiome.h3abionet.org/. This open-access web portal offers an interactive visualization of 14 889 records from 70 bioprojects associated with 72 peer-reviewed research articles. It also allows users to download harmonized metadata according to the filters they apply. The AHMP thereby supports metadata search and retrieval, facilitating access to relevant studies of the African human microbiome. Database URL: https://microbiome.h3abionet.org/.

Subject(s)

Metadata; Microbiota; Humans; Metagenome; Databases, Factual; Metagenomics; Microbiota/genetics

19.

Mass spectrometry-based proteomics data from thousands of HeLa control samples.

Webel, Henry; Perez-Riverol, Yasset; Nielsen, Annelaura Bach; Rasmussen, Simon.

Sci Data ; 11(1): 112, 2024 Jan 23.

Article in English | MEDLINE | ID: mdl-38263211

ABSTRACT

Here we provide a curated, large-scale, label-free mass spectrometry-based proteomics data set derived from HeLa cell lines for general-purpose machine learning and analysis. Data access and filtering are tedious tasks that take up a considerable amount of researchers' time. We therefore provide machine-based metadata for easy selection and overview across the 7,444 raw files and the MaxQuant search output. For convenience, we provide three filtered and aggregated development datasets at the protein-group, peptide and precursor levels. In addition to easy-to-access training data, we provide an SDRF file annotating each raw file with its instrument settings, allowing automated reprocessing. By providing our workflows and analysis scripts, we encourage others to enlarge this data set with instrument runs of further HeLa samples from different machine types.
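An SDRF annotation per raw file is what makes instrument-based selection scriptable. The following is a minimal sketch, assuming a tab-separated SDRF-style table; the column names `comment[data file]` and `comment[instrument]` reflect our understanding of the SDRF-Proteomics convention, while the excerpt, file names, instrument values and the helper `raw_files_for_instrument` are hypothetical and not taken from the published data set. Real SDRF files may encode instruments as ontology-annotated strings, in which case the exact-match comparison would need adapting.

```python
import csv
import io

# Hypothetical excerpt of an SDRF-style table (tab-separated).
# Column names follow the SDRF-Proteomics convention as we understand it;
# file names and instrument values are invented for illustration only.
SDRF_TEXT = (
    "source name\tcomment[data file]\tcomment[instrument]\n"
    "HeLa_1\trun_001.raw\tQ Exactive HF\n"
    "HeLa_2\trun_002.raw\tOrbitrap Exploris 480\n"
)


def raw_files_for_instrument(sdrf_text: str, instrument: str) -> list[str]:
    """Return the raw-file names whose instrument annotation matches exactly."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    return [
        row["comment[data file]"]
        for row in reader
        if row["comment[instrument]"] == instrument
    ]


if __name__ == "__main__":
    # Select only the runs acquired on one (hypothetical) instrument type.
    print(raw_files_for_instrument(SDRF_TEXT, "Q Exactive HF"))
```

Reading the table with `csv.DictReader` keeps the selection keyed by column name rather than position, so additional SDRF columns do not break the filter.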

Subject(s)

HeLa Cells; Machine Learning; Proteomics; Humans; Mass Spectrometry; Metadata

20.

FAIR+R: Making Clinical Data Reliable Through Qualitative Metadata.

Bönisch, Caroline; Kesztyüs, Dorothea; Kesztyüs, Tibor.

Stud Health Technol Inform ; 310: 99-103, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269773

ABSTRACT

Metadata are often researchers' first point of access to data repositories for secondary use. Through automatic metadata generation and metadata harvesting, the amount of data about data has been growing steadily. To make data not only FAIR but also reliable, metadata quality has to be considered. However, as earlier assessments of metadata in different repositories showed, metadata quality still lags behind its potential. Based on an extensive literature review, the authors derive nine measures for assessing metadata in clinical care repositories, such as Medical Data Integration Centers (MeDICs). Building on these measures, the authors propose extending the FAIR Guiding Principles with a fifth block, Reliability, comprising three principles derived from the measures presented. The results form the basis for future work assessing the metadata stored in a MeDIC.

Subject(s)

Hospitals; Metadata; Humans; Reproducibility of Results; Research Personnel
